IN THE UNITED STATES
PATENT AND TRADEMARK OFFICE



Patent Application



Inventor(s):   Mark Beutnagel              Case Name:       Beutnagel 4-1-13-3
               Ariel Fischer
               Joern Ostermann
               Yao Wang

Filing Date:   12/31/1998                  Serial No.:      09/224,583

Examiner:      Michael Opsasnick           Group Art Unit:  2654

Title:         Integration of Talking Heads and Text-to-Speech Synthesizers for Visual TTS



ASSISTANT COMMISSIONER FOR PATENTS 

WASHINGTON, D.C. 20231 
SIR: 



DECLARATION UNDER 37 CFR 1.131 



1. With reference to US Patent 6,181,351, which was filed on April 13, 1998, I hereby
declare the following:

2. My co-inventors and I have invented the subject matter claimed in the instant application
prior to April 13, 1998.

3. In support of this assertion, enclosed is a photocopy of a letter and an accompanying
memorandum that was sent by a Vice President of AT&T Labs, Dr. L. Rabiner, to the IP
Department, asking that a patent application be prepared. This letter is dated prior to
April 13, 1998.



Respectfully, 





Joern Ostermann
AT&T Labs - Research



Subject: Patent Application 

Work Project No. 311615-5006



date:



from: L. R. Rabiner
FP-D151
973-360-8500



Tom Restaino: 

LC Suite 3000

Attached for patent consideration is a memo entitled "FAP definition syntax for
TTS input," by Ariel Fischer, Yao Wang, Mark Beutnagel and Joern Ostermann
of AT&T Labs - Research.

This document describes the syntax used to define FAP bookmark sequences as
input to a TTS system. The purpose of adding this functionality is to allow the
control of facial animations (smile, sadness, ...) directly from the input text of the
TTS. For this kind of animation, simply applying an FAP of a constant value
and removing it after a certain amount of time does not give a realistic face
motion. The proposal allows the user to design complex timing behavior, and
thus to have a high level of freedom for defining the evolution of the FAP
amplitude.

Thank you for your consideration. If you have any further questions, please
contact Joern Ostermann on Ext. 3311.



Att. 

Copy to 
Mark Beutnagel 
Ariel Fischer 
B. G. Haskell 
J. Ostermann 
Yao Wang 





L. R. Rabiner
FAP definition syntax for TTS input



This document describes the syntax used to define FAP bookmark sequences as input to a TTS system. The
purpose of adding this functionality is to allow the control of facial animations (smile, sadness, ...) directly
from the input text of the TTS. For this kind of animation, simply applying an FAP of a constant value and
removing it after a certain amount of time does not give a realistic face motion. The proposal allows the user to
design complex timing behavior, and thus to have a high level of freedom for defining the evolution of the FAP
amplitude.

The following figure shows the complete block diagram describing the integration of a proprietary TTS
Synthesizer into an MPEG-4 face animation system. The FAP bookmarks defined by the user in the input text
of the TTS stream are identified by the speech synthesizer and transmitted in ASCII format to the Phoneme to
FAP converter.




Figure 1: Block diagram showing the integration of a proprietary
TTS into an MPEG-4 face animation system



The syntax of the bookmark sequences used to convey commands to the TTS system is the following, repeated
as many times as the user wants:

<FAP # (FAPselect) FAPval FAPdur>

#: defined according to the visual FCD, Annex C, Table 12-1

FAPselect: defined according to Table 12-3 (expression select), in case # = 2

FAPselect: defined according to Table 12-5 (viseme select), in case # = 1

FAPval: defined in units according to Table 12-1

FAPdur: defined in ms
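
The bookmark syntax above can be illustrated with a small parser. This is a minimal sketch in Python, not the implementation used in the described system. It assumes that fields are whitespace-separated integers, that FAPselect appears only when # is 1 or 2 (one reading of the parenthesized field), and that FAPval may be negative. The names FapBookmark and parse_bookmarks are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class FapBookmark(NamedTuple):
    fap: int                # FAP number (#)
    select: Optional[int]   # FAPselect; assumed present only for # = 1 or # = 2
    value: int              # FAPval
    duration_ms: int        # FAPdur, in ms


# Assumed surface form: <FAP # [FAPselect] FAPval FAPdur>, integers only.
_BOOKMARK = re.compile(r"<FAP\s+(\d+)((?:\s+-?\d+){2,3})\s*>")


def parse_bookmarks(text: str) -> List[FapBookmark]:
    """Extract all FAP bookmarks from TTS input text, in order of appearance."""
    bookmarks = []
    for match in _BOOKMARK.finditer(text):
        fap = int(match.group(1))
        fields = [int(x) for x in match.group(2).split()]
        if fap in (1, 2):
            # Viseme (# = 1) and expression (# = 2) carry a FAPselect field.
            if len(fields) != 3:
                raise ValueError(f"FAP {fap} needs FAPselect, FAPval, FAPdur")
            select, value, duration = fields
        else:
            if len(fields) != 2:
                raise ValueError(f"FAP {fap} needs FAPval and FAPdur only")
            select = None
            value, duration = fields
        bookmarks.append(FapBookmark(fap, select, value, duration))
    return bookmarks


if __name__ == "__main__":
    sample = "I am <FAP 2 1 100 2000> very happy to see you."
    print(parse_bookmarks(sample))
```

The surrounding text is ignored here. A real system would also need to record which word follows each bookmark, since that word fixes the bookmark's start time.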

The Phoneme/Bookmark to FAP Converter (Fig. 1) is responsible for translating the FAP bookmarks defined by
the user into an FAP stream that can be interpreted by the Face Renderer.
FAPval defines the value of the FAP to be applied at the end of FAPdur. The value of the FAP at the beginning
of the action (startValue) depends on the previous value and can be equal to:

- 0 if the FAP bookmark sequence is the first one with this FAP #.

- FAPval of the previously defined FAP (with the same FAP #) if a time longer than the previous FAPdur has
elapsed between the two FAP definitions.

- The actual reached value due to the previous FAP definition if a time shorter than the previous FAPdur has
elapsed between the two FAP definitions.

At the end of FAPdur, FAPval is maintained until another FAP definition gives a new value to reach. To reset
the action, an FAP with FAPval equal to 0 must be defined.

To avoid too many parameters for defining the evolution of the value between the beginning of the action and
its end, the function that computes for each frame the value of the FAP to be sent to the face animation system
is predefined. We implemented the following functions:



(1): linear    (2): 1 - e^{-t}    (3): [illegible]    (4): [illegible]



All these functions use as input the starting value (determined as explained before), FAPval and FAPdur, and
thus are completely determined as soon as the FAP definition is known. After extensive subjective evaluation, it
turns out that the Hermite function of third order gives the best results in terms of realistic [illegible]. Using
splines with more than one Hermite segment would increase the flexibility for designing [illegible] but would
require some knowledge of values placed further in the text than the FAP definition, which
is a significant drawback for a real-time system.

The Hermite function of third order enables one to match the tangent at the beginning of a segment with the
tangent at the end of the previous segment, so that a smooth curve can be guaranteed.

The computation of the Hermite function requires 4 parameters as input, which are: the value of the first point
of the curve (startValue), its tangent (startTangent), the value to be reached at the end of the curve (equal to
FAPval) and its tangent (always set to 0 in our implementation).

For each FAP#, the first curve (due to FAP# bookmark_0) has a startValue (startValue_0) equal to 0 and a
startTangent (startTangent_0) also equal to 0. The value for startTangent and startValue for i > 0 depends on the
time elapsed between FAP# bookmark_{i-1} and FAP# bookmark_i (t_{i-1;i}).
If t_{i-1;i} > FAPdur_{i-1} then:
startValue_i = FAPval_{i-1}

startTangent_i = 0

and the resulting amplitude of the FAP to be sent to the renderer is computed with equation (4.1).

(4.1): $\mathrm{FAPAmpl}_i(t) = \mathrm{startValue}_i \cdot (2t^3 - 3t^2 + 1) + \mathrm{FAPval}_i \cdot (-2t^3 + 3t^2) + \mathrm{startTangent}_i \cdot (t^3 - 2t^2 + t)$

with $t \in [0, 1]$
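
As a consistency check on equation (4.1), the third-order Hermite form with the end tangent fixed to 0 can be evaluated at both ends of the segment. The short derivation below restates the polynomial weights as read from the memo and is a sketch rather than part of the original text. The derivative expression is obtained by differentiating each weight with respect to t.

```latex
\begin{aligned}
\mathrm{FAPAmpl}_i(t) &= \mathrm{startValue}_i\,(2t^3 - 3t^2 + 1)
  + \mathrm{FAPval}_i\,(-2t^3 + 3t^2)
  + \mathrm{startTangent}_i\,(t^3 - 2t^2 + t) \\
\frac{d\,\mathrm{FAPAmpl}_i}{dt}(t) &= \mathrm{startValue}_i\,(6t^2 - 6t)
  + \mathrm{FAPval}_i\,(-6t^2 + 6t)
  + \mathrm{startTangent}_i\,(3t^2 - 4t + 1) \\[4pt]
\mathrm{FAPAmpl}_i(0) &= \mathrm{startValue}_i, \qquad
\mathrm{FAPAmpl}_i(1) = \mathrm{FAPval}_i \\
\frac{d\,\mathrm{FAPAmpl}_i}{dt}(0) &= \mathrm{startTangent}_i, \qquad
\frac{d\,\mathrm{FAPAmpl}_i}{dt}(1) = 0
\end{aligned}
```

The curve therefore starts at startValue with slope startTangent, and arrives at FAPval with zero slope. This is consistent with FAPval being held constant once FAPdur has elapsed.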



FAPdur_i is used to relocate and scale the time parameter, t, from [0 1] to [t_i, t_i + FAPdur_i], with t_i being the
time when the word following FAP# bookmark_i in the text is pronounced. Thus, equation (4.2) gives the
rendering time:



(4.2): Rendering time for $\mathrm{FAPAmpl}_i(t) = t_i + t \cdot \mathrm{FAPdur}_i$
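
The time mapping in equation (4.2) can be sketched in Python as follows. The function name rendering_times_ms and the fixed frame period are assumptions made for illustration. The memo does not specify how frames are aligned with the bookmark start, so this sketch simply places the first frame at t_i.

```python
from typing import List, Tuple


def rendering_times_ms(t_i_ms: float, fap_dur_ms: float,
                       frame_period_ms: float) -> List[Tuple[float, float]]:
    """Return (rendering time in ms, normalized t) for each frame of one segment.

    Frames are assumed to start at t_i and repeat every frame_period_ms until
    t_i + FAPdur_i. Each rendering time equals t_i + t * FAPdur_i, as in (4.2).
    """
    if fap_dur_ms <= 0 or frame_period_ms <= 0:
        raise ValueError("duration and frame period must be positive")
    frames = []
    n = 0
    while n * frame_period_ms <= fap_dur_ms:
        offset = n * frame_period_ms
        t = offset / fap_dur_ms            # normalized parameter in [0, 1]
        frames.append((t_i_ms + offset, t))
        n += 1
    return frames


if __name__ == "__main__":
    # Hypothetical values: bookmark word spoken at 1000 ms, FAPdur = 200 ms, 25 fps.
    for render_time, t in rendering_times_ms(1000, 200, 40):
        print(render_time, t)
```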
If t_{i-1;i} < FAPdur_{i-1} then:

startValue_i = FAPAmpl_{i-1}(t_{i-1;i} / FAPdur_{i-1})

startTangent_i = Tangent_{i-1}(t_{i-1;i} / FAPdur_{i-1}), which is computed with equation (4.3):

(4.3): $\mathrm{Tangent}_i(t) = \mathrm{startValue}_i \cdot (6t^2 - 6t) + \mathrm{FAPval}_i \cdot (-6t^2 + 6t) + \mathrm{startTangent}_i \cdot (3t^2 - 4t + 1)$
with $t \in [0, 1]$

and the resulting amplitude of the FAP is again computed with equation (4.1).
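
The two cases for startValue and startTangent can be combined into one chaining routine. The Python sketch below is an illustration under stated assumptions, not the implementation described in the memo. Segment and build_segments are hypothetical names, and the bookmark start times in the example are invented. Equality of elapsed time and FAPdur is treated like the "longer" case, since both cases give the same result there. The tangent is reused exactly as written in equation (4.3). The memo does not say whether it is rescaled when consecutive FAPdur values differ, so no rescaling is done here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start_time_ms: float   # t_i: when the word after the bookmark is spoken
    start_value: float     # startValue_i
    start_tangent: float   # startTangent_i
    fap_val: float         # FAPval_i
    fap_dur_ms: float      # FAPdur_i


def amplitude(seg: Segment, t: float) -> float:
    """Equation (4.1): third-order Hermite curve with end tangent 0."""
    return (seg.start_value * (2 * t**3 - 3 * t**2 + 1)
            + seg.fap_val * (-2 * t**3 + 3 * t**2)
            + seg.start_tangent * (t**3 - 2 * t**2 + t))


def tangent(seg: Segment, t: float) -> float:
    """Equation (4.3): derivative of (4.1) with respect to t."""
    return (seg.start_value * (6 * t**2 - 6 * t)
            + seg.fap_val * (-6 * t**2 + 6 * t)
            + seg.start_tangent * (3 * t**2 - 4 * t + 1))


def build_segments(bookmarks: List[Tuple[float, float, float]]) -> List[Segment]:
    """Chain (t_i_ms, FAPval, FAPdur_ms) bookmarks of a single FAP #, in time order."""
    segments: List[Segment] = []
    for t_i, fap_val, fap_dur in bookmarks:
        if not segments:
            start_value, start_tangent = 0.0, 0.0       # first bookmark
        else:
            prev = segments[-1]
            elapsed = t_i - prev.start_time_ms          # t_{i-1;i}
            if elapsed >= prev.fap_dur_ms:
                # Previous curve finished: start from its target, flat.
                start_value, start_tangent = prev.fap_val, 0.0
            else:
                # Previous curve interrupted: continue from where it got to.
                t = elapsed / prev.fap_dur_ms
                start_value = amplitude(prev, t)
                start_tangent = tangent(prev, t)
        segments.append(Segment(t_i, start_value, start_tangent, fap_val, fap_dur))
    return segments


if __name__ == "__main__":
    # Bookmark values follow the joy example; the start times are hypothetical.
    for seg in build_segments([(0, 100, 2000), (1100, 130, 600), (3000, 0, 1000)]):
        print(seg)
```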

The next figure shows an example of a timing curve created with 3 bookmark sequences for FAP 2
(expression) and FAPselect 1 (joy).



[Plot of the FAP value sent over time for the input text:
<FAP 2 1 100 2000> ...text... <FAP 2 1 130 600> ...text... <FAP 2 1 0 1000> ...text...]

Figure 2: Example of a timing curve